Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config by qgallouedec · Pull Request #5638 · huggingface/trl

qgallouedec · 2026-04-24T20:48:55Z

What does this PR do?

On top of #5637

before:

  attention_bias                                   True                               → False
  eos_token_id                                     [151329, 151336, 151338]           → None
  first_k_dense_replace                            3                                  → 1
  head_dim                                         128                                → <missing>
  hidden_size                                      5120                               → 8
  intermediate_size                                12288                              → 32
  moe_intermediate_size                            1536                               → 1408
  n_routed_experts                                 160                                → 4
  num_attention_heads                              96                                 → 4
  num_experts_per_tok                              8                                  → 2
  num_hidden_layers                                92                                 → 2
  num_key_value_heads                              8                                  → 2
  num_nextn_predict_layers                         1                                  → <missing>
  pad_token_id                                     151329                             → None
  rope_theta                                       1000000                            → 10000.0
  routed_scaling_factor                            2.5                                → 1.0
  use_qk_norm                                      True                               → False
  vocab_size                                       151552                             → 151365

after:

[config_diff] zai-org/GLM-4.5 vs tiny (10 differences)
  first_k_dense_replace                            3                                  → 1
  head_dim                                         128                                → 2
  hidden_size                                      5120                               → 8
  intermediate_size                                12288                              → 32
  moe_intermediate_size                            1536                               → 32
  n_routed_experts                                 160                                → 4
  num_attention_heads                              96                                 → 4
  num_experts_per_tok                              8                                  → 2
  num_hidden_layers                                92                                 → 2
  num_key_value_heads                              8                                  → 2
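A minimal sketch of how a diff like the two listings above could be computed, assuming both configs are available as plain dictionaries. The names `config_diff` and `format_diff` are hypothetical and are not the helper used in the repository; the sample values are a small subset copied from the listings.

```python
MISSING = "<missing>"


def config_diff(reference: dict, tiny: dict) -> list[tuple[str, object, object]]:
    """Return (key, reference_value, tiny_value) for each reference key whose value differs.

    Keys that exist only in `tiny` are ignored in this sketch.
    """
    rows = []
    for key in sorted(reference):
        ref_value = reference[key]
        tiny_value = tiny.get(key, MISSING)
        if ref_value != tiny_value:
            rows.append((key, ref_value, tiny_value))
    return rows


def format_diff(rows: list[tuple[str, object, object]]) -> list[str]:
    """Render rows in a column layout similar to the listings above."""
    return [f"  {key:<48} {str(ref):<34} → {tiny}" for key, ref, tiny in rows]


# Subset of the values shown above (reference vs. tiny before/after this PR).
reference = {"hidden_size": 5120, "head_dim": 128, "use_qk_norm": True, "rope_theta": 1000000}
tiny_before = {"hidden_size": 8, "use_qk_norm": False, "rope_theta": 10000.0}
tiny_after = {"hidden_size": 8, "head_dim": 2, "use_qk_norm": True, "rope_theta": 1000000}

rows_before = config_diff(reference, tiny_before)
rows_after = config_diff(reference, tiny_after)

print(f"before: {len(rows_before)} differences, after: {len(rows_after)} differences")
for line in format_diff(rows_after):
    print(line)
```

Only the intentionally small size fields remain in the "after" rows of this sample, which mirrors the intent of the change.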

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

No AI usage: the PR was written entirely by a human.
AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Note

Low Risk
Only updates the tiny-model generation script’s Glm4MoeConfig constants (no runtime/library code changes), but could affect downstream consumers expecting the previous tiny config.

Overview
Updates the GLM-4.5 tiny-model generation script to stop deriving vocab_size from the tokenizer and instead hardcode it, while setting several config fields that were missing or misaligned with GLM-4.5 (e.g., moe_intermediate_size, head_dim, attention_bias, eos_token_id, pad_token_id, rope_theta, routed_scaling_factor, use_qk_norm, and num_nextn_predict_layers).

This makes the generated tiny checkpoint’s config closer to the upstream reference, reducing config diffs when running print_config_diff and pushing the tiny model to the hub.
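As an illustration only, the resulting tiny config can be reconstructed from the two diffs in the PR description. The generation script itself is not shown on this page, so the dictionary layout and the `check_tiny_config` helper below are assumptions; the values for fields that dropped out of the "after" diff are taken to equal the reference values listed in the "before" diff.

```python
# Fields absent from the "after" diff, so presumably now equal to the
# zai-org/GLM-4.5 reference values shown in the "before" diff.
aligned_fields = {
    "attention_bias": True,
    "eos_token_id": [151329, 151336, 151338],
    "num_nextn_predict_layers": 1,
    "pad_token_id": 151329,
    "rope_theta": 1000000,
    "routed_scaling_factor": 2.5,
    "use_qk_norm": True,
    "vocab_size": 151552,  # hardcoded rather than derived from the tokenizer
}

# Fields that stay deliberately small (the 10 remaining differences).
tiny_fields = {
    "first_k_dense_replace": 1,
    "head_dim": 2,
    "hidden_size": 8,
    "intermediate_size": 32,
    "moe_intermediate_size": 32,
    "n_routed_experts": 4,
    "num_attention_heads": 4,
    "num_experts_per_tok": 2,
    "num_hidden_layers": 2,
    "num_key_value_heads": 2,
}

tiny_config_kwargs = {**aligned_fields, **tiny_fields}


def check_tiny_config(cfg: dict) -> list[str]:
    """Hypothetical sanity checks; returns a list of problems (empty if none)."""
    problems = []
    if cfg["num_attention_heads"] % cfg["num_key_value_heads"] != 0:
        problems.append("num_attention_heads must be a multiple of num_key_value_heads")
    if cfg["num_experts_per_tok"] > cfg["n_routed_experts"]:
        problems.append("num_experts_per_tok exceeds n_routed_experts")
    if cfg["first_k_dense_replace"] > cfg["num_hidden_layers"]:
        problems.append("first_k_dense_replace exceeds num_hidden_layers")
    if max(cfg["eos_token_id"] + [cfg["pad_token_id"]]) >= cfg["vocab_size"]:
        problems.append("special token id falls outside vocab_size")
    return problems


print(check_tiny_config(tiny_config_kwargs))
```

These keyword arguments could plausibly be passed to `Glm4MoeConfig(**tiny_config_kwargs)` in transformers, but that call is not verified against the script in this PR.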

Reviewed by Cursor Bugbot for commit 1961d87. Bugbot is set up for automated code reviews on this repo.

HuggingFaceDocBuilderDev · 2026-04-24T20:51:42Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 540502a8d3

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 49d5fca.

qgallouedec added 14 commits April 24, 2026 18:30

New tiny model generation

730c876

cohere and fix vocab size

a060e6d

print pr

158b891

precommit

f5eedfb

precommit

ffbf3b1

cohere2

d24a76c

deepseek v3

f0f5563

revert to keep this focused

59cb16e

nit

9bc6ad4

revert

a7ad64a

revove force and update readme

6b361e1

nit commit message

b2cf603

better

b4bae78

Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config

540502a

cursor Bot reviewed Apr 24, 2026

Comment thread tests/conftest.py Outdated

fix generation config peft

0b7fa20

chatgpt-codex-connector Bot reviewed Apr 24, 2026

Comment thread tests/conftest.py Outdated

qgallouedec added 2 commits April 24, 2026 16:53

Merge branch 'main' into new-tiny-model-generation

538f486

Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe

49d5fca

cursor Bot reviewed Apr 24, 2026

Comment thread scripts/generate_tiny_models/for_causal_lm/glm4_moe_for_causal_lm.py

qgallouedec and others added 9 commits April 27, 2026 16:01

Qwen3.6 integration (#5642)

39bafd4

Release: v1.3 (#5647)

07e65d7

⬆️ Bump dev version (#5648)

7198c14

Add Qwen3.6 model generation script with updated configuration

71b8219

merge main

545e5e9

Merge remote-tracking branch 'origin/main' into new-tiny-model-generation

db13f29

Merge branch 'main' into new-tiny-model-generation

5cc7fc8

Merge remote-tracking branch 'origin/main' into new-tiny-model-generation

7f25397

Qwen3 Instruct-2507

4730fec

cmpatino and others added 13 commits May 3, 2026 21:20

Upload testing suite for DistillationTrainer (#5615)

b231373

Fix OOM in CI by reducing batch size in VLM SFT tests (#5687)

abb98ac

Fix OOM in CI test reruns due to GPU memory leak from traceback frame locals (#5681)

9f19b4a

Add training-invariance tests (#5686)

3f56be7

Regenerate invariance data + relax the tolerance (#5688)

d232332

Merge remote-tracking branch 'origin/main' into new-tiny-model-generation

3cde729

Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe

e05a232

gemma3

bd5e693

Merge branch 'main' into new-tiny-model-generation

9e3b6d6

Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe

6366461

fix conftest

3bc2a57

Merge branch 'main' into new-tiny-model-generation

4c5ac17

Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe

8967867

Base automatically changed from new-tiny-model-generation to main May 5, 2026 15:47

qgallouedec and others added 8 commits May 5, 2026 15:53

Merge remote-tracking branch 'origin/main' into fix-tiny-glm4-moe

6ba6f64

Skip GLM4 model tests for transformers version < 5.0.0

9a76c3d

Merge branch 'main' into fix-tiny-glm4-moe

5d8ba77

Merge branch 'main' into fix-tiny-glm4-moe

e513313

Merge branch 'main' into fix-tiny-glm4-moe

40f2916

fix

2d7c42a

Merge branch 'main' into fix-tiny-glm4-moe

c4d64d1

Merge branch 'main' into fix-tiny-glm4-moe

baa5e0e

qgallouedec requested review from AmineDiro, albertvillanova and kashif May 12, 2026 14:36

qgallouedec and others added 3 commits May 14, 2026 19:23

Merge branch 'main' into fix-tiny-glm4-moe

7e7e558

style

995f20b

revert conftest

1961d87

qgallouedec merged commit 7c3af3d into main May 15, 2026
6 of 13 checks passed

qgallouedec deleted the fix-tiny-glm4-moe branch May 15, 2026 00:04

qgallouedec merged 57 commits into main from fix-tiny-glm4-moe


4 participants
